Spoken Interface for Correcting Phoneme Recognition Errors in Learning of Unknown Words

Authors

  • Xiang Zuo
  • Taisuke Sumii
  • Naoto Iwahashi
  • Mikio Nakano
  • Kotaro Funakoshi
  • Natsuki Oka
Abstract

This paper describes a novel method that enables users to teach systems the phoneme sequences of new words through speech interaction. With this method, users correct misrecognized phoneme sequences incrementally by making corrective utterances, each of which may contain the whole word or a segment of it. During the interaction, if a corrective utterance yields a better phoneme sequence than the previous one, the user can either stop the interaction or make another corrective utterance; otherwise, the user can reject the utterance. The novel aspects of this method are 1) interactive correction by speech, 2) the use of spoken word segments for locating misrecognized phonemes, and 3) the use of generalized posterior probability (GPP) as a measure for correcting misrecognized phonemes. The experimental results show that the proposed method achieved 96.8% phoneme accuracy and 79.1% word accuracy with fewer than seven corrective utterances.
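The idea of using a spoken word segment to locate misrecognized phonemes can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the segment is located by a plain edit-distance alignment against every substring of the current phoneme sequence, and it leaves out the GPP-based measure and the user's accept/reject step entirely. The function names `locate_segment` and `replace_segment` and the example phonemes are hypothetical.

```python
def locate_segment(current, segment):
    """Find the span current[start:end] with minimum edit distance to segment.

    Assumption: unit costs for substitution, insertion and deletion.
    Returns (start, end, cost).
    """
    n, m = len(current), len(segment)
    # cost[i][j]: best distance between segment[:i] and a substring of
    # current that ends at position j; start[i][j]: where that substring begins.
    cost = [[0] * (n + 1) for _ in range(m + 1)]
    start = [list(range(n + 1)) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = i
        start[i][0] = 0
        for j in range(1, n + 1):
            sub = cost[i - 1][j - 1] + (segment[i - 1] != current[j - 1])
            skip_seg = cost[i - 1][j] + 1  # segment phoneme has no counterpart
            skip_cur = cost[i][j - 1] + 1  # current phoneme has no counterpart
            best = min(sub, skip_seg, skip_cur)
            cost[i][j] = best
            if best == sub:
                start[i][j] = start[i - 1][j - 1]
            elif best == skip_seg:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
    end = min(range(n + 1), key=lambda j: cost[m][j])
    return start[m][end], end, cost[m][end]


def replace_segment(current, segment):
    """Overwrite the best-matching span of current with the corrective segment."""
    start, end, _ = locate_segment(current, segment)
    return current[:start] + segment + current[end:]


# Hypothetical example: the corrective segment fixes the second phoneme.
hypothesis = ["k", "a", "m", "p", "u", "t", "a"]
correction = ["k", "o", "m"]
corrected = replace_segment(hypothesis, correction)
```

In the method described above, a replacement like this would only be kept if it improved the phoneme sequence; that decision is outside the scope of this sketch.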

Similar resources

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Influences of spoken word planning on speech recognition.

In 4 chronometric experiments, influences of spoken word planning on speech recognition were examined. Participants were shown pictures while hearing a tone or a spoken word presented shortly after picture onset. When a spoken word was presented, participants indicated whether it contained a prespecified phoneme. When the tone was presented, they indicated whether the picture name contained the...

Recurrent Neural Network-Based Phoneme Sequence Estimation Using Multiple ASR Systems' Outputs for Spoken Term Detection

This paper describes a novel correct phoneme sequence estimation method that uses a recurrent neural network (RNN)-based framework for spoken term detection (STD). In an automatic speech recognition (ASR)-based STD framework, ASR performance (word or subword error rate) affects STD performance. Therefore, it is important to reduce ASR errors to obtain good STD results. In this study, we use an ...

Fast Approximate Spoken Term Detection from Sequence of Phonemes

We investigate the detection of spoken terms in conversational speech using phoneme recognition with the objective of achieving smaller index size as well as faster search speed. Speech is processed and indexed as a sequence of one best phoneme sequence. We propose the use of a probabilistic pronunciation model for the search term to compensate for the errors in the recognition of phonemes. Thi...

Evaluation of DNN-based Phoneme Estimation Approach on the NTCIR-12 SpokenQuery&Doc-2 SQ-STD Subtask

This paper proposes a correct phoneme sequence estimation method using a deep neural network (DNN)-based framework for spoken term detection (STD). We use a DNN architecture as a correct phoneme estimator. The DNN-based estimator estimates the correct phoneme sequence of an utterance from several kinds of phoneme-based transcriptions produced by multiple ASR systems in post-processing, for reducing ...

Publication date: 2011